Correlation Clustering for Crosslingual Link Detection
Abstract
The crosslingual link detection problem calls for identifying news articles in multiple languages that report on the same news event. This paper presents a novel approach based on constrained clustering. We discuss a general approach to constrained clustering using correlation clustering, a recent graph-based clustering framework. We introduce a correlation clustering implementation that uses linear program chunking to allow larger datasets to be processed. We show how to apply the correlation clustering algorithm to the crosslingual link detection problem and present experimental results showing that correlation clustering improves upon both the hierarchical clustering approaches commonly used in link detection and hierarchical clustering approaches that take constraints into account.
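To make the clustering objective concrete, here is a minimal Python sketch of correlation clustering with hard must-link and cannot-link constraints. It is not the paper's method: the abstract describes an LP-based implementation with linear program chunking, whereas this sketch swaps in a constrained variant of the randomized pivot heuristic (KwikCluster, due to Ailon, Charikar and Newman) because it fits in a few lines. The function names, the `weights` dictionary format (positive values meaning "likely the same event", negative meaning "likely different events"), and the choice to enforce constraints exactly are assumptions made for illustration. Must-linked items are first merged into atomic groups; a random pivot group then absorbs every unclustered group with positive aggregate weight to it, unless doing so would break a cannot-link.

```python
import random
from collections import defaultdict


def _find(parent, x):
    # Union-find "find" with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def disagreement_cost(labels, weights):
    """Correlation clustering objective: total |weight| of violated pairs.

    A positive pair split across clusters, or a negative pair placed in
    the same cluster, counts as a disagreement.
    """
    cost = 0.0
    for (i, j), w in weights.items():
        together = labels[i] == labels[j]
        if (w > 0 and not together) or (w < 0 and together):
            cost += abs(w)
    return cost


def constrained_pivot_clustering(n, weights, must_link=(), cannot_link=(), seed=0):
    """Cluster items 0..n-1 using signed pairwise weights and hard constraints.

    Hypothetical interface: weights maps (i, j) -> float; must_link and
    cannot_link are iterables of (i, j) pairs that are enforced exactly.
    Returns labels, where labels[i] is the cluster id of item i.
    """
    # 1. Collapse must-link pairs into atomic groups (union-find).
    parent = list(range(n))
    for i, j in must_link:
        parent[_find(parent, i)] = _find(parent, j)
    group = [_find(parent, i) for i in range(n)]

    # 2. Sum item weights into group-to-group weights; record forbidden pairs.
    gweight = defaultdict(float)
    for (i, j), w in weights.items():
        a, b = group[i], group[j]
        if a != b:
            gweight[frozenset((a, b))] += w
    forbidden = set()
    for i, j in cannot_link:
        a, b = group[i], group[j]
        if a == b:
            raise ValueError(f"items {i} and {j} are must-linked and cannot-linked")
        forbidden.add(frozenset((a, b)))

    # 3. Pivot loop: a random pivot absorbs every unclustered group with
    #    positive weight to it, unless that would break a cannot-link.
    rng = random.Random(seed)
    remaining = sorted(set(group))
    rng.shuffle(remaining)
    cluster_of = {}
    next_id = 0
    while remaining:
        pivot = remaining.pop(0)
        members = [pivot]
        for g in list(remaining):
            positive = gweight.get(frozenset((pivot, g)), 0.0) > 0
            allowed = all(frozenset((g, m)) not in forbidden for m in members)
            if positive and allowed:
                members.append(g)
                remaining.remove(g)
        for g in members:
            cluster_of[g] = next_id
        next_id += 1
    return [cluster_of[group[i]] for i in range(n)]


# Usage with four hypothetical articles (e.g. 0=en, 1=fr, 2=en, 3=de):
w = {(0, 1): 0.8, (0, 2): -0.5, (1, 2): -0.3, (2, 3): 0.6, (0, 3): -0.4, (1, 3): -0.2}
labels = constrained_pivot_clustering(4, w)
# Any pivot order yields the partition {0, 1}, {2, 3}, so the cost is 0.0.
cost = disagreement_cost(labels, w)
```

The pivot heuristic keeps the example short, but no approximation guarantee is claimed for this weighted, constrained variant. When solution quality matters more than speed, a common alternative is an LP relaxation over pairwise distance variables with triangle-inequality constraints, which is closer in spirit to the LP-based implementation the abstract describes.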
Similar resources
Monolingual and Crosslingual Plagiarism Detection
Automatic plagiarism detection with respect to a reference corpus compares a suspicious text to a set of documents in order to relate the plagiarised fragments to their potential source. The suspicious and source documents can be written either in the same language (monolingual) or in different languages (crosslingual). In the context of the Ph.D., our work has been focused on both monolingual and...
Crosslingual speech recognition with multilingual acoustic models based on agglomerative and tree-based triphone clustering
The paper describes our ongoing work on crosslingual speech recognition based on multilingual triphone hidden Markov models. Multilingual acoustic models were built using two different clustering procedures: agglomerative triphone clustering and tree-based triphone clustering. The agglomerative clustering procedure is based on measuring the similarity of triphones on a phoneme level where the m...
Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper aims to propose an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
Detection of lung cancer using CT images based on novel PSO clustering
Lung cancer is one of the most dangerous diseases and causes a large number of deaths. Early detection and analysis can be very helpful for successful treatment. Image segmentation plays a key role in the early detection and diagnosis of lung cancer. The K-means algorithm and classic PSO clustering are the most common segmentation methods, but they produce poor outputs. In t...
A Correlation Clustering Approach to Link Classification in Signed Networks
Motivated by social balance theory, we develop a theory of link classification in signed networks using the correlation clustering index as a measure of label regularity. We derive learning bounds in terms of correlation clustering within three fundamental transductive learning settings: online, batch and active. Our main algorithmic contribution is in the active setting, where we introduce a new...